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Abstract 

Although  nonparametric  regression  has  traditionally  focused  on  the  estimation  of 
conditional  mean  functions,  nonparametric  estimation  of  conditional  quantile  functions  is 
often  of  substantial  practical  interest.  We  explore  a  class  of  quantile  smoothing  splines, 
which  are  defined  as  solutions  to 

.1 


min  B>,(y/ -*(*/))  + XCj"  \s"^)\p^)X'p 

2  e  G  J0 


g  6  9 

with  px(u)  =  (x-I(u  <  0))w  and  p>\.  For  the  particular  choices  p=\  and  p=°°  and 

g=  [geCl[0,l]'-  (J  \g'\x)\pdx)1/p<°o},  we  show  that  solutions,  g,  are  parabolic 
splines,  i.e.  piecewise  quadratics,  on  the  mesh  [x\,  ...,  xn),  and  may  be  computed  by 
standard  /  j  -type  linear  programming  techniques.  At  X  =  0,  g  interpolates  the  Xth  quan- 
tiles  at  the  distinct  design  points,  and  for  X  sufficiently  large  g  is  the  linear  regression 
quantile  fit  (Koenker  and  Bassett(1978))  to  the  observations.  In  fact,  the  entire  path  of 
solutions,  in  the  penalty  parameter  X,  may  be  efficiently  computed  by  parametric  linear 
programming  methods.  For  a  somewhat  more  general  class  of  quantile  smoothing  splines 
we  establish  that  (Ex(g(X)-go(X))-)  =Op(n  (logn)  ),  is  achievable,  under  mild 
conditions  on  the  true  conditional  quantile  function  of  Y  given  X.  Finally  we  note  that 
the  approach  may  be  easily  adapted  to  impose  monotonicity  and/or  convexity  constraints 
on  the  fitted  function. 


Some  Key  Words:  Robust  Nonparametric  Regression,  Quantiles,  Smoothing  Splines, 
Penalized  Likelihood,  Parametric  Linear  Programming,  Sieve  Estimators. 


1.  INTRODUCTION 

Several  authors  have  recently  proposed  methods  for  nonparametric  estimation  of 
conditional  quantile  functions:  Troung  (1989)  following  the  pioneering  work  of  Stone 
(1977)  on  nearest  neighbor  methods,  Chaudhuri(1991),  Samanta  (1989)  and  Antoch  and 
Janssen  (1989)  using  kernel  methods,  and  White  (1991)  employing  neural  networks. 
Hendricks  and  Koenker  (1992)  discuss  regression  spline  models  and  apply  them  to  elec- 
tricity demand  data.  D.R.  Cox  and  M.C.  Jones  in  the  discussion  of  Cole(1988),  reviving 
a  suggestion  of  Bloomfield  and  Steiger  (1983),  have  recently  proposed  estimating  quan- 
tile smoothing  splines  which  minimize 

TPx(yi-g(Xi))  +  X\(g'\x))2dx 

where  pT(«)  =  (x  -  I(u  <  0))w  is  the  Czech  function  of  Koenker  and  Bassett  (1978).  Here 
the  parameter  xe  [0,  1]  controls  the  quantile  of  interest,  while  X  €  R+  controls  the 
smoothness  of  the  resulting  estimate,  thus  generalizing  the  extensive  literature  on  li 
smoothing  splines  pioneered  by  Wahba  (1990).  This  is  an  intriguing  idea,  and  has  also 
been  mentioned,  for  example,  in  Cox(1983),  Eubank  (1988)  and  Utreras  (1981)  in  the 
median  Pi/2(«)  =  Vi  \u\  case.  However,  the  resulting  quadratic  program  poses  some 
serious  computational  obstacles.  Obviously  the  computational  virtues  of  the  piecewise 
linear  form  of  the  first  term  of  the  objective  function  are  sacrificed  by  the  quadratic  form 
of  the  smoothness  penalty. 

One  is  thus  naturally  led  to  ask:  "Why  not  replace  (g"(x))  in  the  penalty  by 
\g"{x)\V  The  median  special  case  of  this  problem  has  been  studied  in  a  remarkable 
paper  by  Schuette  (1978)  in  the  actuarial  literature.  We  will  show,  expanding  on 
Schuette's  discrete  version  of  the  problem  using  finite  differences,  that  minimizing  (1.1) 
retains  the  linear  programming  form  of  the  parametric  version  of  the  quantile  regression 
problem  and  yields  solutions  which  are  easy  to  compute. 


1.1.  TheL!  Roughness  Penalty 

Given  observations  {(v,-,  xf) :  /  =  1,  ...,  n }  with  0  <X\  <  ...  <xn  <  1  consider  the 
problem  of  minimizing 

Rz.x[g}=ZPx(yi-g(Xi))  +  ^l\8"^)\dx  (i.i) 

over  g  €  g  ,  the  space  of  continuous  functions  on  [0,  1]  with  continuous  first  derivative 
and  absolutely  integrable  second  derivitive. 

Definition.  A  function  g  :  [0,  1]  — >R  is  a  parabolic  spline  with  mesh 
0  =  xq  <  x  i  <  ...  <  xn  <  xn+\  =  1  if  g  e  g  and  g  (x)  is  piecewise  quadratic  in  the  intervals 
[X(,  JC/+i),  i  =  0,  ...,  n,  that  is  ,  g  has  the  form 

g  (X)  =  OLi(x  -  Xj)2  +  p;(*  -  Xi)  +  Ji       Xj<X  <  xM         i  =  0,  ...,  n  (1.2) 

Theorem  1.1.  There  exists  a  parabolic  spline  g  which  solves  (1.1). 

Proof.  Suppose  g  solves  (1.1).  We  will  show  that  there  exists  a  parabolic  spline  g  such 
that  R[g]=  R[g].    Suppose,  provisionally,  that  sgn(#"(jt))  is  constant  on  the  intervals 


[Xj,  Xj+i),  i  =  0,  ...,  n,  so  we  may  write 

J1  \g"(x)\dx  =  '/2£  |£'(x<+i)-*'C*/)l  (1.3) 

0  /=0 

Note  that  we  may  set  a,-  =  (g'fo+i)  -  g'(Xi))/(xi+l  -x,)  for  i  =  1,  ...,  n-1  and  determine 
the  remaining  coefficients  of  g  from  the  conditions 

iC*/+)  =  Yi  =g(*i+)  i  =  U  ....  »-l 

and 

£'(*;+)  =  P/=s'(x/+)       '  =  1,  ....  i-l 

The  parabolic  spline  g  thus  constructed  is  in  Cl[0,  1],  since  g  was,  and  clearly  achieves 
the  same  value  of  R.  Finally,  note  that  if  g"  changes  sign  on  an  interval  between  knots, 
say  [xi,  jc(+i),  the  same  construction  and  the  fact  that  j\f  |  >  J/ implies  that  the  resulting 
g  satisfies  R  [g]  <  R[g]  which  contradicts  our  hypothesis  that  g"  could  change  sign.    □ 

Having  established  the  form  of  the  solution  to  (1.1),  it  is  straightforward  to  develop 
an  algorithm  to  compute  g.  Using  (1.2)  the  penalty  becomes 

i\g"(x)\dx  =  2nZhi\ai\. 


1 


;=i 


where  hx  =jc,+i  - xx,  i  =  1,  ...,  n-\.  This  enables  us  to  express  the  problem  (1.1)  as  a 
/ 1  -type  linear  program.  A  number  of  important  features  of  the  solution  are  immediately 
apparent  from  the  fact  that  the  problem  is  a  linear  program.  See  Koenker  and  Ng(1992) 
for  algorithmic  details.  Since  we  have  (n+\)  free  parameters  and  (2n  -  1)  pseudo- 
observations;  solutions  must  have  n+\  residuals  which  are  zero  (by  complementary 
slackness)  and  in  our  case  tfiese  zero  residuals  correspond  to  either  (i)  exact  interpolation 
of  an  observation,  so  yt  =  y(-  or  (ii)  linearity  of  g  in  a  particular  subinterval  of  the  design 
mesh,  so  a,  =  0  for  some  i.  Obviously,  the  parameter  X  controls  the  comparative  "advan- 
tage" of  these  two  alternatives.  When  X  is  sufficiently  large  all  the  a,  will  be  zero  and  the 
solution  will  correspond  to  the  bivariate  linear  regression  quantile  fit.  When  X  is 
sufficiently  small  all  n  observations  will  be  interpolated,  and  all  but  one  of  the  a,'s  can 
be  non-zero. 

As  in  any  smoothing  problem,  choice  of  "bandwidth",  here  represented  by  the 
parameter  X,  is  critical.  For  quantile  smoothing  splines,  this  problem  is  ameliorated  by 
the  fact  that  the  whole  family  of  solutions  to  (1.1)  for  X  e  [0,  °°)  may  be  easily  found  by 
parametric  linear  programming.  An  important  implication  of  this  fact  is  that  we  may  ini- 
tially solve  the  simpler  linear  quantile  regression  problem  corresponding  to  X  =  °°  and 
gradually  relax  the  roughness  penalty  with  a  sequence  of  simplex  pivots,  thus  avoiding  a 
direct  solution  of  the  potentially  rather  large  problem.  Each  transition  to  a  new  solution 
involves  a  single  simplex  pivot  of  an  extremely  sparse  constraint  matrix,  and  hence  solv- 
ing (1.1)  for  a  broad  range  of  A.  quite  efficient.  An  interesting  aspect  of  the  way  that  solu- 
tions g(X)  depend  upon  the  penalty  parameter  X  involves  the  number  of  interpolated 
points.  In  the  classical  smoothing  spline  literature  much  has  been  made  of  the  "effective 
dimensionality"  or  "degrees  of  freedom"  of  the  estimated  curves  corresponding  to  various 
X.  Such  measures  of  dimensionality  are  usually  based  on  the  trace  of  various  quasi- 
projection  matrices  in  the  least-squares  theory.    See  e.g.  Buja,  Hastie  and  Tibshirani 


(1989)  for  an  extensive  discussion.  For  the  quantile  smoothing  spline  the  connection  is 
more  direct  in  the  sense  that  there  is  an  explicit  trade-off  between  the  number  of  interpo- 
lated points  and  the  number  of  linear  segments.  Since  "reasonable"  smoothing  suggests 
that  the  number  of  interpolated  points  is  small  relative  to  n,  it  is  probably  sensible  to  start 
the  parametric  programming  at  the  linear  quantile  regression  solution  rather  than  at  X  =  0. 
If  the  design  is  in  "general  position"  so  no  two  observations  share  the  same  design  point, 
there  must  be  at  least  2  and  at  most  n  interpolated  y/'s.  Call  this  number  p\.  Clearly,  p\ 
is  a  plausible  measure  of  the  effective  dimension  of  the  fitted  model  with  penalty  param- 
eter X,  and  n  —p\  +  1,  which  corresponds  to  the  number  of  linear  segments  in  the  fitted 
function,  is  a  plausible  measure  of  the  degrees  of  freedom  of  the  fit.  Such  decomposi- 
tions might  be  used  in  conjunction  with  the  function  R  [g]  itself  to  implement  data-driven 
bandwidth  choice,  for  example,  along  the  lines  of  Schwarz  (1978). 


1.2.  The  Loo  Penalty 

If  we  replace  the  L\  roughness  penalty  with  the  LM  penalty,  we  have  the  objective 
function 

n 

RxX[g]=  IPtC^/  - g(xt))  +  Xs\iy\g"(x)\  (1.4) 

1=1  x 

which  we  would  like  to  minimize  over  gs  Q.  Again  we  can  characterize  the  solution  as  a 
quadratic  spline. 

Theorem  1.2.  There  exists  a  parabolic  spline  which  solves  (1.4). 

Proof.  Suppose  g  solves  (1.4).  Let  g  be  of  the  form  (1.2)  with  (cc/,P;,Y/)'s  defined 
exacdy  as  in  the  proof  of  Theorem  2.1.  Clearly,  geC1  and  has  the  same  "fidelity  to  the 
data"  as  g.  That  g  isn't  rougher  than  g  in  IHU  follows  immediately  from 

sup  \g'\x)\h,  >  J  \g"(x)\dx  >  |  \g"{x)dx  |  =  sup  \g"(x)\hh 

for  Ai  =  [jc,-,jc/+1),  1=1,  •  •  •  ,/j-1.  □ 

This  characterization  of  the  solution  allows  us  to  rewrite  the  penalty  as 

|irCx)|L=2max|a,|. 

We  can  now  parametrize  the  g  as  in  the  previous  case  and  formulate  (1.4)  as  a  linear  pro- 
gram, but  the  penalty  is  now  relegated  to  role  of  linear  inequality  constraints.  With  the 
L  i  penalty,  as  we  have  seen,  a  direct  tradeoff  between  the  number  of  interpolated 
responses  and  the  number  of  linear  segments  between  design  observations  existed.  With 
the  Loo  penalty  this  tradeoff  is  altered.  Active  constraints  now  correspond  not  to  (X/=0 
but  to  |oc;  |=A,  the  upper  bound  for  g"\  thus  the  tradeoff  is  between  segments  that  attain 
the  prescribed  bound,  and  observations  which  are  interpolated.  Of  course,  in  the  limiting 
case  A=0  the  solution  is,  as  with  the  L  {  penalty  with  A.=°°,  the  linear  xth  regression  quan- 
tile estimate.  Clearly,  the  L  i  penalty  favors  a  piecewise  linear  form  for  g  with  a  few 
sharp  elbows  where  g"  can  be  very  large.  The  Loo  penalty  enforces  a  uniform  bound  on 


g"  and  thus  straightens  the  elbows  and  introduces  a  modest  curvature  in  the  longer  seg- 
ments to  compensate. 

1.3.  Extensions 

In  many  practical  applications  there  is  an  immediate  question  of  extending  these 
methods  to  multivariate  settings.  The  additive  spline  models  of  Buja,  Hastie  and 
Tibshirani  (1989)  and  others  naturally  suggest  themselves.  Clearly  the  nonlinear  charac- 
ter of  the  present  smoothers  vitiate  the  attractive  iterative  "backfitting"  algorithms  avail- 
able in  the  /2-case.  But  feasible  estimators  may  still  be  possible  using  a  limited  number 
of  simplex  pivots  from  an  initial  linear  (in  covariates)  quantile  function  estimate. 

There  are  a  number  of  intriguing  extensions  incorporating  further  constraints. 
Monotonicity  and  convexity  of  the  fitted  function  g  may  be  readily  imposed  by  simply 
imposing  further  linear  inequality  constraints  on  the  parameters  of  the  problem.  The  final 
section  includes  examples  of  montonically  constrained  estimates.  While  adding  such 
inequality  constraints  to  the  corresponding  1 2  problem  results  in  a  significant  increase  in 
complexity  adding  linear  inequality  constraints  to  the  quantile  smoothing  spline  prob- 
lems does  not  alter  the  fundamental  nature  of  the  optimization  problem  to  be  solved. 

In  situations  in  which  the  derivitives  of  go  need  to  be  estimated  it  may  be 
worthwhile  to  consider  higher  order  derivitive  penalties  as  is  sometimes  done  in  the  clas- 
sical smoothing  spline  literature.  This  might  be  the  case  in  the  growth  curve  analysis  of 
Cole(1988)  for  example. 


2.  ASYMPTOTICS 

To  explore  the  asymptotic  behavior  of  quantile  smoothing  splines  we  will  assume 
that  the  observations  on  (Y;,Xi)  pairs  are  independent  and  identically  distributed.  We 
would  like  to  estimate  the  xth  conditional  quantile  function  of  Y  \X  which  we  will  denote 

go(x)  =  F-Y\x(x\X=x) 

We  will  assume  further  that  goe  g=  [geCl[0,\]  |sup|g"|  <<»}  and  that  the  conditional 
density  exists  and  is  strictly  bounded  away  from  0  and  °°  when  evaluated  at  go(x),  that  is, 
there  exists  m  >  0  and  M  <  «>  such  that  for  all  x  in  [0, 1]  we  have 

m  <fY\X(go(x))^M. 

The  lower  bound  is  an  identifiability  condition  insuring  the  uniqueness  of  go,  while  the 
upper  bound  is  required  to  exclude  pathologically  rapid  convergence.  We  will  consider  a 
somewhat  more  general  class  of  estimators  which  solve 


min£pT(y/-g(*/)) 


where 


gn  =  {geg\g(x)=£lcjBj(x),  j|£"|<A„}, 

and  the  functions  [Bj(x)\  i=\,...,pn}  constitute  a  B-spline  basis  for  the  parabolic  splines 
on  the  uniform  mesh  (0,0,0, hn,2hn,  •••  ,1,1,1},  where  hn  =p~l ,  see  e.g.  deBoor(1978). 


The  choice  of  the  restricted  B-spline  definition  of  gn  is  partly  for  theoretical  conveni- 
ence, but  it  also  has  an  important  practical  rationale.  The  B-spline  formulation  of  (1.1) 
is  considerably  easier  to  compute  than  the  full  smoothing  spline  formulation  when  n  is 
very  large.  It  represents  a  compromise  between  the  classical  smoothing  spline  and  the 
more  restrictive  regression  splines,  see  Wahba(1990).  Note  that  there  is  a  direct  connec- 
tion between  the  choice  of  A„  in  the  definition  of  Qn  and  the  X  parameter  in  the  previous 
section.  Our  objective  will  be  to  explore  conditions  on  pn,  and  A„  which  insure  con- 
sistency of  g„  in  theZo-norm, 

\\gn-g0\\2^(Ex(gn(X)-g0(X))2)V\ 

Theorem  2.  If  pn  =0(nl/5/\ogn)  and  A  =  0(\ogn),  then  under  the  conditions  of  the 
preceeding  paragraph, 

\\gn-go\\2  =  Op(n-2/5(\ogn)2). 

Proof.  We  will  sketch  the  proof  which  relies  heavily  on  recent  work  of  Shen  and 
Wong(1992)  on  convergence  rates  for  sieve  estimators.  In  Shen  and  Wong's  notation  the 
contribution  of  the  /th  observation  to  the  pseudo-likelihood  (negative  fidelity)  in  our  case 
is 

Hg(x),y)  =  -px(y-g(x)). 

As  in  Bassett  and  Koenker(1986),  for  g  (x)>go(x),  we  have 

EY\xlPx(Y-g)-px<r-go)]=    I  FY\x(y)-*)dy 

go(x) 

while  for  g(x)<go(x),  the  sign  and  limits  of  integration  are  reversed.  Thus,  using  the  den- 
sity bound,  there  exists  a  8>0  such  that  for  any  y  satisfying  |.y-goCO  I  -  8, 


and  therefore 


and 


\FY]X(y)-x\  >V2m\y-g0(x)\, 
Ey \xi?x(Y-g)  -  ?x(X-g0)]  >m\g-g0\2, 


inf      EY  lx[px(Y-g)  -  px(Y-go)]  >  me2, 


This  verifies  Condition  CI  of  Shen  and  Wong  with  oc=l.  Their  Condition  C2  is  trivially 
satisfied  with  (5=1  since 

\px(Y-g)-Px(Y-go)\<\g-go\- 

Let  H(e,A)  =  \ogK(z,A),  denote  the  e-entropy  of  the  set  A  where  A"(e,,4),  denotes 
the  number  of  IHU  balls  of  radius  e  required  to  cover  A.  To  verify  Condition  C3  of  Shen 
and  Wong  we  must  compute  H{z,^n)  where 

7n  =  ipx(y~g)  ~  Px(y-go)'-ge  Qn)- 

Since  the  set  gn  is  isomorphic  to  the  finite  dimension  parameter  space  0„  with  elements 
0  =  (y1,a1,  •  •  •  ,aPn)    where   Yi=g(0)    and    a,  =  g"(x)   for  xe((i-\)hjh],   we   have 


H(e,fn)  <H(e,gn)<  H(e,Qn),  and  since 

1  Pn 

JU"(jc)|^=p-1X|a,|<A„ 

0  i=l 

Theorems  IX  and  X  of  Kolmogorov  and  Tihomirov  (1959)  imply, 

//(£,£,)  <2p„log(p,A)iog(l/e). 

Thus,  choosing  p„  =  n  v5/logn,  and  A„  =  log/i,  Condition  C3  is  satisfied  with  A3=  2/5, 
r0  =  1/10  and  r  =  0+,  observing  Shen  and  Wong's  convention  that  e~°    =  log  (1/e). 
Theorem  1  of  Shen  and  Wong(1992)  now  implies  that 

\\gn  -goh  =Op(max{n-X)  ,  ||jc„g0  -goh  .  ^farfo^o))). 
where  7i„£0  denotes  an  element  of  Qn,  K(gi,g0)  =  E[px(Y-g{)-  pT(Y-g0)  and 

u  =  2/5-loglogn/(21ogfl). 

Expanding  K(g  i,go)  around  g 0  we  obtain 

K(gl,go)<EfY\X(go(X))\gl(X)-g0(X)\2<M\\gl(X)-g0(X)\\l 
For  n  sufficiently  large,  Powell(1981,  Theorem  20.3)  establishes  that, 

l|rc„go-£olU^3/2p~2||go"ll~ 

Thus,  given  our  choices  of  pn  and  A„,  the  second  and  third  terms  in  the  max  expression 
are  Opin'^5  (logn)2).  Noting  that  n~v  =  n~2j5^\ogn  completes  the  proof.     □ 

An  essentially  identical  argument  would  yield  the  same  rate  of  convergence  for  the 
Loo  roughness  penalty  estimator. 

3.  PICTURES 

In  Figure  3.1  we  illustrate  three  different  quantile  smoothing  splines  estimated  using 
the  L  1  roughness  penalty.  The  data  is  the  well  known  motorcycle  data  described  for 
example  in  Hardle(1990).  In  Figure  3.2  we  illustrate  three  comparable  estimates  using 
the  Loo  penalty  for  the  same  problem.  The  piecewise  linearity  of  the  Lj  estimates  is 
already  apparent  in  these  figures.  Since  there  are  knots  at  each  design  point,  apparent 
kinks,  "elbows",  in  the  fitted  curve  correspond  to  large  values  of  the  (piecewise  constant) 
second  derivitive  on  short  intervals  between  adjacent  design  points.  In  Koenker,  Ng,  and 
Portnoy(1992)  we  contrast  these  estimates  with  those  from  a  kernel  estimate  of  the  con- 
ditional quantile  functions  and  conclude  that  the  splines  perform  considerably  better. 
Note  that  in  the  Loo  picture  the  three  estimated  quantile  functions  cross  at  the  penultimate 
point;  this  is  apparently  due  to  the  wide  separation  of  the  last  design  point  from  the  oth- 
ers, a  fact  that  has  prompted  other  investigators  to  omit  it  from  their  plots. 

In  Figures  3.3  and  3.4  we  illustrate  two  quantile  smoothing  spline  estimates  for  the 
data  appearing  in  Scheutte(1978)  which  is  a  typical  example  of  actuarial  "graduation"  of 
mortality  tables.  In  this  example  it  may  be  reasonable  to  impose  monotonicity  on  the 
fitted  curves,  so  we  contrast  estimates  based  on  both  L\  and  Loo  penalties  with  their 
monotonically  constrained  counterparts.  Note  that  since  the  first  derivitive  of  g  is  piece- 
wise  linear,  it  suffices  to  constrain  the  first  derivitive  at  each  of  the  knots  to  be  positive, 
which    requires   n   additional    linear   inequality   constraints.     Here   the    estimates   are 
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Figure  3.3      Median  Smoothing  Splines 

for  Schuette  Example:  L\  penalty 
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Figure  3.4      Median  Smoothing  Splines 

for  Schuette  Example:  LM  penalty 
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probably  somewhat  "undersmoothed"  and  the  piecewise  linearity  of  the  L  t  penalty  esti- 
mate is  somewhat  less  apparent. 

All  of  the  computing  was  carried  in  the  S  language  of  Becker,  Chambers,  and 
Wilks(1988).  The  underlying  algorithms  are  based  on  modifications  of  Bartels  and 
Conn(1980)  algorithm  for  constrained  l\  regression. 
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